An Aging Small-Signal Model for Degradation Prediction of Microwave Heterojunction Bipolar Transistor S-Parameters Based on Prior Knowledge Neural Network

In this paper, an aging small-signal model for degradation prediction of microwave heterojunction bipolar transistor (HBT) S-parameters based on prior knowledge neural networks (PKNNs) is explored. A dual-extreme learning machine (D-ELM) structure with an adaptive genetic algorithm (AGA) optimization process is used to simulate the fresh S-parameters of InP HBT devices and the degradation of S-parameters after accelerated aging, respectively. In addition to the reliability parametric inputs of the original aging problem, the S-parameter degradation trend obtained from the aging small-signal equivalent circuit is used as additional information to inject into the D-ELM structure. Good agreement was achieved between measured and predicted results of the degradation of S-parameters within a frequency range of 0.1 to 40 GHz.


Introduction
With the rapid development of microelectronics technology, the increasing maturity of the integrated circuit process, and the continuous exploration of the depths of satellite communications, radar aerospace, and other fields, the application and development of RF semiconductor devices at microwave millimeter-wave frequencies have gradually occupied an important position in high-speed communications [1,2].In order to reduce the cost and improve the performance, some key dimensions of the process have been continuously reduced.Accordingly, the electric field strength and current density to which the device is subjected are also increasing, which exacerbates the probability and randomness of the occurrence of various failure mechanisms and poses a serious challenge to the reliability of integrated circuits and the improvement of electronic design automation (EDA) [3].
High-reliability EDA tools need to fit, cope with, and correct real-world physical and process problems in software with a systematic approach and predictive margins as high as possible and ultimately ensure that the circuit design simulation results are consistent with the degradation results of the flow test in the reliability environment to achieve the optimal end-of-life performance/power consumption/area (EOL-PPA).The device model, as an important bridge between process and circuit design, is a key link in the DTCO flow under the new architecture, and its accuracy directly determines the accuracy and efficiency of circuit design [4].Through the establishment of reliability models for key components in integrated systems, on the one hand, the internal physical mechanism of the device can be explored in depth, laying a foundation for the realization of the device reliability degradation simulation.On the other hand, a better prediction of the characteristic degradation can be obtained in the large-scale simulation of the electrical characteristics of circuits as well as reliability degradation effects.In addition to providing guidance for process improvement, device-level reliability models also play a key role in the reliability-aware design of integrated circuits.The establishment of efficient and accurate reliability models can be used to guide the reliability design and optimization of circuits in order to reduce the time and cost of high-reliability circuit design [5].
In recent years, research on the conventional small-signal model has focused on the characterization of non-ideal effects corresponding to ultra-high frequency rates and severe parasitics [6][7][8].The reliability modeling-related research on this basis has been insufficient, still focusing mainly on the physical level of the device, with constraints such as long R&D cycle time, slow modeling, and incompatibility with EDA softwares [9,10].A reliability compact model based on semi-empirical and semi-physical approaches may lack accuracy when actually characterizing degradation in time and frequency domains due to some approximations or simplifications [11,12].Therefore, in addition to the ongoing in-depth study of the physical nature of RF microwave transistor aging and the associated reliability of compact models, we believe that there is another approach worth exploring.This method involves the use of an artificial neural network (ANN) methodology based on machine learning-assisted behavioral-level modeling [13] to achieve degradation prediction of RF devices in harsh reliability environments.The method can be trained on the degradation behavior of the device under specific reliability conditions, and then the trained network can be used to make predictions.Although the data-driven reliability-based artificial neural network model can achieve higher accuracy, it strongly relies on a large number of reliability test samples for training and learning [14].Even if the TCAD numerical simulation tool is used to obtain the training data, a large amount of reliable experimental data is still needed to perform the complex calibration of the degradation of the amount of marching associated with aging as a time-correlated process [15].While microwave devices involve a variety of complex combinations of conditions such as materials, geometries, frequencies, processes, etc., reliability data are more difficult to obtain and costly, and reliability experiments require strict requirements on environment, equipment, samples, etc., which require a large amount of cost investment.It can be seen that improving the traditional reliability artificial neural network model with limited reliability test samples in order to achieve efficient, accurate, and high generalization ability modeling is particularly important.Therefore, we hope to use physical/semi-physical degradation laws or formulas to guide this behavioral model and adopt a simple and effective knowledge injection to improve the reliability of the black box model.
In this paper, a prior knowledge-based neural network approach for device reliability modeling is proposed.A dual-extreme learning machine (D-ELM) is used to decompose the aging and fresh features of the device into two sub-networks to simplify the complex nonlinear mapping of the original problem.The established degradation function of the key parameters of the device is introduced into the aging equivalent-circuit model to obtain the degradation of each component of the S-parameters.Then, the S-parameter degradation information obtained from the coarse model is injected into the degradation sub-network as prior knowledge.In addition to the reliability parameter inputs of the original aging problem, the coarse model outputs are also used as additional inputs.At this point, the input-output mapping to be learned by the neural network is the mapping between the existing approximate reliability model outputs and the original problem, thus weakening the complex mapping relationship between reliability parameters and S-parameter degradation and improving the quality of the mapping.This, in turn, reduces the dependence on reliable training samples and improves accuracy with the high generalization ability of the model.Moreover, we have utilized an adaptive evolutionary algorithm to optimize the model and improve the stability of network training.Finally, the proposed PKNN-based reliability model is validated by applying it to 0.7 µm InGaAs/InP HBTs.

Device Structure and Experiment
The InP DHBT device used in this paper adopts a three mesa-structure, and its structural schematic is shown in Figure 1, in which the non-alloyed Ti/Pt/Au and Pt/Ti/Pt/Au are used for N-type and P-type ohmic contacts, respectively.Base self-alignment technology, non-contact exposure, and wet etching technology are used in the fabrication of the device.

Device Structure and Experiment
The InP DHBT device used in this paper adopts a three mesa-structure, and its structural schematic is shown in Figure 1, in which the non-alloyed Ti/Pt/Au and Pt/Ti/Pt/Au are used for N-type and P-type ohmic contacts, respectively.Base self-alignment technology, non-contact exposure, and wet etching technology are used in the fabrication of the device.In this paper, we use a Keysight B1500A semiconductor device parameter analyzer to apply high-field electrical stress to the device by means of a DC probe.The B1500A has a very sophisticated cycling control module for DC probes to precisely apply the electrical stress, and this test module can also monitor whether the junction of the InP HBT device is broken during the test.The schematic diagram of high-field electrical stress application is shown in Figure 2, in which the base and collector of the device are connected to the SMU1 and SMU2 ports of Keysight B1500A, respectively.We set the two ports to voltage input mode and used the EasyEXPERT group+ 2015 software for the control of the electrical stress.The selection of stresses needs to be carried out before conducting the high-field stress aging test.Applying a reverse high-field stress of about 80% of the device breakdown voltage BVCBO to the BC junction can allow the device to operate properly without breakdown and cause non-negligible degradation [16].We have chosen about 70-90% of the breakdown voltage for high-field stress aging.The breakdown voltage of the InP HBT device used is about 4.7 V; thus, we apply constant reverse bias voltages of 3.4-4.3V to the BC junction of the device and select four stress biases in steps of 0.3 V to perform Sparameter tests after up to 300 minutes of stress application while keeping the BE junction open.To accurately assess the effect of high-field electrical stress on device RF characteristics, a Rohde & Schwarz ZVA 50 Network Analyzer controlled by IC-CAP 2018 software is employed for S-parameter testing, and the ground-signal-ground (GSG) microwave probe was used in the on-chip measuring process.In addition, to correct the error terms introduced by the test equipment itself, a system error calibration is required before testing with VNA.In this paper, the on-chip test of InP HBT S-parameters is a two-port network test system, and to obtain the actual S-parameters of the DUT, the TOSM (Through, Open, Short, Match) system error calibration method [17,18] is used to remove the parasitic elements introduced by the equipment components such as test cables and probe tips before the reliability test.It is worth noting that the calibration of the RF on-chip test platform is performed before the electrical stress is applied.Also, attention is paid to maintaining a stable test environment during the test to ensure that the calibration conditions do not deviate.All of the above on-chip tests are performed on the CASCADE MICRO-TECH Summit 11000 Prober, as shown in Figure 3.In this paper, we use a Keysight B1500A semiconductor device parameter analyzer to apply high-field electrical stress to the device by means of a DC probe.The B1500A has a very sophisticated cycling control module for DC probes to precisely apply the electrical stress, and this test module can also monitor whether the junction of the InP HBT device is broken during the test.The schematic diagram of high-field electrical stress application is shown in Figure 2, in which the base and collector of the device are connected to the SMU1 and SMU2 ports of Keysight B1500A, respectively.We set the two ports to voltage input mode and used the EasyEXPERT group+ 2015 software for the control of the electrical stress.The selection of stresses needs to be carried out before conducting the high-field stress aging test.Applying a reverse high-field stress of about 80% of the device breakdown voltage BV CBO to the BC junction can allow the device to operate properly without breakdown and cause non-negligible degradation [16].We have chosen about 70-90% of the breakdown voltage for high-field stress aging.The breakdown voltage of the InP HBT device used is about 4.7 V; thus, we apply constant reverse bias voltages of 3.4-4.3V to the BC junction of the device and select four stress biases in steps of 0.3 V to perform S-parameter tests after up to 300 minutes of stress application while keeping the BE junction open.To accurately assess the effect of high-field electrical stress on device RF characteristics, a Rohde & Schwarz ZVA 50 Network Analyzer controlled by IC-CAP 2018 software is employed for S-parameter testing, and the ground-signal-ground (GSG) microwave probe was used in the on-chip measuring process.In addition, to correct the error terms introduced by the test equipment itself, a system error calibration is required before testing with VNA.In this paper, the on-chip test of InP HBT S-parameters is a two-port network test system, and to obtain the actual S-parameters of the DUT, the TOSM (Through, Open, Short, Match) system error calibration method [17,18] is used to remove the parasitic elements introduced by the equipment components such as test cables and probe tips before the reliability test.It is worth noting that the calibration of the RF on-chip test platform is performed before the electrical stress is applied.Also, attention is paid to maintaining a stable test environment during the test to ensure that the calibration conditions do not deviate.All of the above on-chip tests are performed on the CASCADE MICROTECH Summit 11000 Prober, as shown in Figure 3.

Prior Knowledge Construction
For prior knowledge neural network-based aging modeling, the first step is that we need to obtain a priori aging knowledge in the form of physical/empirical formulas or equivalent circuits, which needs to represent the degradation tendency of the device characteristics of interest but not necessarily be very accurate or complete.In this paper, we complete the construction of the prior knowledge by modeling the HBT aging equivalent circuit.
In our previous study [19], the small-signal model with base-distributed capacitances applied to HBTs was proposed, as shown in Figure 4.The equivalent circuit model consists of three modules: parasitic elements, extrinsic distributed capacitance elements, and intrinsic model elements.The outermost part of the model is the parasitic module.Cpbe, Cpce, and Cpbc are the base-emitter, collector-emitter, and base-collector parasitic capacitances, respectively.Lb, Lc, and Le are the lead inductances associated with the base, collector, and emitter, respectively.The above pad parasitic parameters are extracted by the Open and Short Test Structure method described in [20].Rb, Rc, and Re are the series resistances associated with the base, collector, and emitter, respectively, which can be determined in the cut-off condition measurements [21].After peeling off the parasitic elements mentioned above, the periphery of the equivalent circuit features extrinsic distributed elements, in which Cbcx is the base-collector distributed capacitance and Cbex is the base-emitter distributed capacitance.Intrinsic model elements include the dynamic base resistance Rbi, the dynamic base-emitter resistance Rbe, the intrinsic base-emitter capacitance Cbe, the intrinsic base-collector capacitance Cbc, the DC transconductance Gm0, and the delay time τ.We use the peeling algorithm from our earlier work [19] to obtain the values of these extrinsic and intrinsic elements.

Prior Knowledge Construction
For prior knowledge neural network-based aging modeling, the first step is that we need to obtain a priori aging knowledge in the form of physical/empirical formulas or equivalent circuits, which needs to represent the degradation tendency of the device characteristics of interest but not necessarily be very accurate or complete.In this paper, we complete the construction of the prior knowledge by modeling the HBT aging equivalent circuit.
In our previous study [19], the small-signal model with base-distributed capacitances applied to HBTs was proposed, as shown in Figure 4.The equivalent circuit model consists of three modules: parasitic elements, extrinsic distributed capacitance elements, and intrinsic model elements.The outermost part of the model is the parasitic module.Cpbe, Cpce, and Cpbc are the base-emitter, collector-emitter, and base-collector parasitic capacitances, respectively.Lb, Lc, and Le are the lead inductances associated with the base, collector, and emitter, respectively.The above pad parasitic parameters are extracted by the Open and Short Test Structure method described in [20].Rb, Rc, and Re are the series resistances associated with the base, collector, and emitter, respectively, which can be determined in the cut-off condition measurements [21].After peeling off the parasitic elements mentioned above, the periphery of the equivalent circuit features extrinsic distributed elements, in which Cbcx is the base-collector distributed capacitance and Cbex is the base-emitter distributed capacitance.Intrinsic model elements include the dynamic base resistance Rbi, the dynamic base-emitter resistance Rbe, the intrinsic base-emitter capacitance Cbe, the intrinsic base-collector capacitance Cbc, the DC transconductance Gm0, and the delay time τ.We use the peeling algorithm from our earlier work [19] to obtain the values of these extrinsic and intrinsic elements.

Prior Knowledge Construction
For prior knowledge neural network-based aging modeling, the first step is that we need to obtain a priori aging knowledge in the form of physical/empirical formulas or equivalent circuits, which needs to represent the degradation tendency of the device characteristics of interest but not necessarily be very accurate or complete.In this paper, we complete the construction of the prior knowledge by modeling the HBT aging equivalent circuit.
In our previous study [19], the small-signal model with base-distributed capacitances applied to HBTs was proposed, as shown in Figure 4.The equivalent circuit model consists of three modules: parasitic elements, extrinsic distributed capacitance elements, and intrinsic model elements.The outermost part of the model is the parasitic module.C pbe , C pce , and C pbc are the base-emitter, collector-emitter, and base-collector parasitic capacitances, respectively.L b , L c , and L e are the lead inductances associated with the base, collector, and emitter, respectively.The above pad parasitic parameters are extracted by the Open and Short Test Structure method described in [20].R b , R c , and R e are the series resistances associated with the base, collector, and emitter, respectively, which can be determined in the cut-off condition measurements [21].After peeling off the parasitic elements mentioned above, the periphery of the equivalent circuit features extrinsic distributed elements, in which C bcx is the base-collector distributed capacitance and C bex is the base-emitter distributed capacitance.Intrinsic model elements include the dynamic base resistance R bi , the dynamic base-emitter resistance R be , the intrinsic base-emitter capacitance C be , the intrinsic base-collector capacitance C bc , the DC transconductance G m0 , and the delay time τ.We use the peeling algorithm from our earlier work [19] to obtain the values of these extrinsic and intrinsic elements.
The aging reliability test data obtained from different batches of InP HBT devices under different stress times and stress magnitudes are characterized by the above-mentioned small-signal model topology and parameter extraction to determine the key model parameters affecting the characterization of degradation of HBT devices.Inserting a parameter ΔP related to the variation in the above key parameters due to aging effects, ΔP is defined as the difference between the key model parameters after and before degradation, which can be extracted experimentally by its variation with different stress time and stress magnitude conditions.Thus, the aging Equation ( 1) that can describe the degradation trend of the key parameters of the model is given.For the function establishment of the degradation amount parameter ΔP, we use a double exponential type function for modeling, as shown in Equation ( 2 where Paging is the value of key model parameters after degradation, ΔP is the amount of degradation of key model parameters, and Pinitial is the value of key parameters in the fresh state.AO is the accelerated degradation saturation factor, a is the degradation acceleration factor, and µ is the degradation acceleration index factor.The degradation equations for the above key parameters are substituted back to the model topology shown in Figure 4 to obtain the degradation trend of the S-parameters of the device, and we use this rough degradation law as a priori degradation knowledge to improve the quality of machine learning-assisted aging modeling mapping.

Dual-Extreme Learning Machine(D-ELM) Structure
ELM neural network is usually a single-hidden layer feedforward neural network.It can randomly generate the weight between the input layer and the hidden layer and the bias of the neurons in the hidden layer, and there is no need to adjust them during the training process.Only the number of neurons in the hidden layer needs to be set to obtain the unique optimal solution [22].
The above traditional MLP structure for aging modeling is shown in Figure 5, which requires a large amount of degradation measurement data to construct the training set and more training time to learn the nonlinear relationships of the original problem to achieve the desired modeling accuracy.For the aging prediction of devices, therefore, we propose a D-ELM architecture, as shown in Figure 6, which decomposes the aging features and fresh features of devices into two sub-networks to simplify the complex nonlinear mapping of the original problem.NETWORK1 and NETWORK2 in Figure 6 have a common frequency input (Freq) and operating bias inputs (VCE, Ib), and NETWORK2 has The aging reliability test data obtained from different batches of InP HBT devices under different stress times and stress magnitudes are characterized by the above-mentioned small-signal model topology and parameter extraction to determine the key model parameters affecting the characterization of degradation of HBT devices.Inserting a parameter ∆P related to the variation in the above key parameters due to aging effects, ∆P is defined as the difference between the key model parameters after and before degradation, which can be extracted experimentally by its variation with different stress time and stress magnitude conditions.Thus, the aging Equation ( 1) that can describe the degradation trend of the key parameters of the model is given.For the function establishment of the degradation amount parameter ∆P, we use a double exponential type function for modeling, as shown in Equation ( 2).
P aging (V CB,stress , t) = ∆P(V CB,stress , t) + P initial ( 1) where P aging is the value of key model parameters after degradation, ∆P is the amount of degradation of key model parameters, and P initial is the value of key parameters in the fresh state.A O is the accelerated degradation saturation factor, a is the degradation acceleration factor, and µ is the degradation acceleration index factor.The degradation equations for the above key parameters are substituted back to the model topology shown in Figure 4 to obtain the degradation trend of the S-parameters of the device, and we use this rough degradation law as a priori degradation knowledge to improve the quality of machine learning-assisted aging modeling mapping.

Dual-Extreme Learning Machine(D-ELM) Structure
ELM neural network is usually a single-hidden layer feedforward neural network.It can randomly generate the weight between the input layer and the hidden layer and the bias of the neurons in the hidden layer, and there is no need to adjust them during the training process.Only the number of neurons in the hidden layer needs to be set to obtain the unique optimal solution [22].
The above traditional MLP structure for aging modeling is shown in Figure 5, which requires a large amount of degradation measurement data to construct the training set and more training time to learn the nonlinear relationships of the original problem to achieve the desired modeling accuracy.For the aging prediction of devices, therefore, we propose a D-ELM architecture, as shown in Figure 6, which decomposes the aging features and fresh features of devices into two sub-networks to simplify the complex nonlinear mapping of the original problem.NETWORK1 and NETWORK2 in Figure 6 have a common frequency input (Freq) and operating bias inputs (V CE , I b ), and NETWORK2 has additional reliability parameter inputs (V CB,stress , t stress ).Thus, we use NETWORK1 and NETWORK2 to characterize the fresh S-parameters and the degradation of S-parameters in HBTs, respectively.Then, the two sets of results are summed to obtain the aging S-parameters.
additional reliability parameter inputs (VCB,stress, tstress).Thus, we use NETWORK1 and NETWORK2 to characterize the fresh S-parameters and the degradation of S-parameters in HBTs, respectively.Then, the two sets of results are summed to obtain the aging Sparameters.In addition, we give a comparison of the time consumption and accuracy of the two structures in Table 1 with the same amount of data (1340 sample sets) and the percentage of the training set (75%) for the same problem.It is evident that the proposed D-ELM architecture has a much faster training speed and modeling accuracy; it saves 17.75% training time, and the accuracy is improved by about two times compared to the traditional MLP structure.

PKNN-Based Aging Modeling Method
The D-ELM structure we proposed in Section 3.2 is a pure black-box model whose structure does not contain any relevant knowledge of the problem to be solved, and only additional reliability parameter inputs (VCB,stress, tstress).Thus, we use NETWORK1 and NETWORK2 to characterize the fresh S-parameters and the degradation of S-parameters in HBTs, respectively.Then, the two sets of results are summed to obtain the aging Sparameters.In addition, we give a comparison of the time consumption and accuracy of the two structures in Table 1 with the same amount of data (1340 sample sets) and the percentage of the training set (75%) for the same problem.It is evident that the proposed D-ELM architecture has a much faster training speed and modeling accuracy; it saves 17.75% training time, and the accuracy is improved by about two times compared to the traditional MLP structure.

PKNN-Based Aging Modeling Method
The D-ELM structure we proposed in Section 3.2 is a pure black-box model whose structure does not contain any relevant knowledge of the problem to be solved, and only In addition, we give a comparison of the time consumption and accuracy of the two structures in Table 1 with the same amount of data (1340 sample sets) and the percentage of the training set (75%) for the same problem.It is evident that the proposed D-ELM architecture has a much faster training speed and modeling accuracy; it saves 17.75% training time, and the accuracy is improved by about two times compared to the traditional MLP structure.

PKNN-Based Aging Modeling Method
The D-ELM structure we proposed in Section 3.2 is a pure black-box model whose structure does not contain any relevant knowledge of the problem to be solved, and only the training data determine the structure of the network input-output mapping, so a large number of dataset sources are required to ensure the validity of the model.
Therefore, to link the prior knowledge constructed in Section 3.1, we further improve the D-ELM structure.For the degradation network (NETWORK2), we construct the degradation formulas of the key parameters and the aging equivalent circuit as the prior knowledge module to be injected to obtain the prior knowledge-based degradation network (NETWORK2').Subsequently, we employ this network in conjunction with NETWORK1 to simulate the S-parameters of the device in the fresh state to obtain the knowledge-based reliability neural network model, whose modeling framework is shown in Figure 7.The mathematical derivation process is shown in the following equations: Micromachines 2023, 14, x FOR PEER REVIEW 7 the training data determine the structure of the network input-output mapping, so a la number of dataset sources are required to ensure the validity of the model.Therefore, to link the prior knowledge constructed in Section 3.1, we further impr the D-ELM structure.For the degradation network (NETWORK2), we construct the d radation formulas of the key parameters and the aging equivalent circuit as the p knowledge module to be injected to obtain the prior knowledge-based degradation work (NETWORK2').Subsequently, we employ this network in conjunction with N WORK1 to simulate the S-parameters of the device in the fresh state to obtain knowledge-based reliability neural network model, whose modeling framework is sho in Figure 7.The mathematical derivation process is shown in the following equations For NETWORK1 and NETWORK2', their input vectors are shown in Equation and (4), respectively: T q q c e , q b , q , , , (q 1,2, ,T1) q ce,q b,q CB,stress,q stress,q ij q ij q , , , , ,Re( ) ,Im( ) , (q 1, ,T2) where N1 and N2 represent NETWORK1 and NETWORK2', respectively.T1 and T2 the number of training set samples for NETWORK1 and NETWORK2', respectively.is the output of the coarse model degradation obtained from the aging equivalent cir The corresponding training target matrices of the two networks are shown in Equat ( 5) and ( 6) below: where Sij init is the value of S-parameters in the fresh state.Sij degr is the value of S-parame after degradation.For NETWORK1 and NETWORK2', their input vectors are shown in Equations ( 3) and ( 4), respectively: X N2 q = Freq q , V ce,q , I b,q , V CB,stress,q , t stress,q , Re(∆S ij ) where N1 and N2 represent NETWORK1 and NETWORK2', respectively.T1 and T2 are the number of training set samples for NETWORK1 and NETWORK2', respectively.∆S ij is the output of the coarse model degradation obtained from the aging equivalent circuit.The corresponding training target matrices of the two networks are shown in Equations ( 5) and ( 6) below: where S ij init is the value of S-parameters in the fresh state.S ij degr is the value of S-parameters after degradation.
In addition, the connection weight vectors between the input and hidden layers of NETWORK1 and NETWORK2' are shown in Equations ( 7) and (8), and the bias vectors of the hidden neurons are shown in Equations ( 9) and (10).They can be randomly assigned before training and remain unchanged during the training process.
where H1 and H2 represent the number of hidden layer neurons in NETWORK1 and NETWORK2', respectively.The transfer function of the hidden layer neuron is where γ is the weighted sum of all inputs to the neuron.After all the functional variables in the neural network algorithm are defined, we can obtain the hidden layer output matrix as shown in Equations ( 12) and (13).
In turn, the output matrices of the two networks are obtained separately: ∧ where β N1 and β N2 are the connection weight matrices between the hidden and output layers to be solved during the neural network training, respectively.The optimal weight between the hidden and output layers can be found by obtaining the Moore-Penrose generalized inverse [23] of the output matrix of the hidden layer to complete the training of the ELM network.In summary, we can obtain the objective function of the proposed PKNN-based D-ELM network training as shown in Equation ( 16), and its least squares solution can be obtained from Equation (17).

min(||
with At this point, the network structure, corresponding neuron weights, and bias of the networks are determined.In addition to the reliability parametric inputs of the original problem, the S-parameter degradation information obtained from the aging smallsignal equivalent circuit is used as an additional input to NETWORK2' in the constructed knowledge-based reliability neural network model.Then, the input-output mapping to be learned by the neural network is the mapping between the output of the existing approximate reliability model and the original problem, which provides the degradation knowledge to weaken the strong nonlinear relationship between the reliability coefficients and the degradation of S-parameters and improve the quality of the mapping.This, in turn, serves to reduce the training dataset requirement and improve the network generalization capability and modeling efficiency.

Model Optimization
For traditional ELM networks, random selectivity makes the accuracy and convergence of neural networks relatively unstable.Typically, a network needs to be initialized many times to try out the optimal value.The adaptive genetic algorithm (AGA) is an evolutionary algorithm, a method to search for the optimal solution by simulating the natural evolutionary process [24].Therefore, we use the AGA to automatically optimize the input weights and hidden layer bias of the networks to avoid the effects of random initialization.This optimization process is described in the flowchart shown in Figure 8. First, we need to prepare the training sample sets of input and output data with measured fresh/degraded S-parameters for NETWORK1 and NETWORK2'.After normalizing the above training data, the connection weight between the input and hidden layers and the bias of the neurons in the hidden layer are randomly initialized in the D-ELM network shown in Figure 6.Since the AGA cannot directly deal with the parameters of the problem space, they need to be represented as individuals in the genetic space by coding.A fitness evaluation is then performed for each individual.The fitness function is expressed by Equation (18).
where y mea and y pre represent the measured output and the predicted output of the neural network, respectively.N is the total number of training data.After the fitness evaluation, 15% of individuals in the paternal generation will be rejected, and the remaining individuals will undergo reorganization and mutation to obtain offspring.The corresponding reorganization probabilities px and mutation probabilities pm are set to be adaptive, as shown in Equations ( 19) and ( 20): After the fitness evaluation, 15% of individuals in the paternal generation will be rejected, and the remaining individuals will undergo reorganization and mutation to obtain offspring.The corresponding reorganization probabilities p x and mutation probabilities p m are set to be adaptive, as shown in Equations ( 19) and (20): where f max is the maximum fitness in the population.f avg is the average fitness of the population.f max -f avg reflects the convergence state of optimization.f is the larger fitness of the two individuals to be reorganized.f is the fitness of the individual to be mutated.k 1 k 2 , k 3, and k 4 are constants less than 1.
For high-quality individuals with high fitness, p x and p m are reduced to preserve them; for low-quality individuals with low fitness, p x and p m are increased to eliminate them.In addition, in the early stage of iterative optimization, the population needs a large reorganization and mutation probability to achieve a fast search for the optimal solution, while in the late stage of convergence, the population needs a small reorganization and mutation probability to enable the population to converge quickly after searching for the optimal solution.Then, the superior individuals in the offspring were reinserted into the paternal generation to form a new population.The above process will not stop until the number of training cycles satisfies the number of evolutionary generations set in the experiment.Finally, we decode to obtain the optimal values of the weight between the input layer and the hidden layer and the bias of the hidden neurons.At this point, we can obtain the trained AGA-D-ELM neural network model to predict the aging S-parameters.

Results and Discussion
In order to more intuitively represent the modeling process in this paper, we use the PKNN-based aging small-signal modeling flowchart shown in Figure 9 to briefly summarize the above steps of device small-signal aging knowledge construction, knowledge injection, D-ELM network training, and network optimization.In the first stage, we use NETWORK1 to train and predict the S-parameter samples in the fresh state of the device.Then, we transform the degradation trend of the key parameters into the degradation information of the S-parameters after applying stress to the device through the aging compact model in the second stage.In the third stage, we inject the rough aging information obtained in the second stage into the degradation NETWORK2, then train and optimize this network.Finally, the aging S-parameters of the device are obtained by accumulating the predicted outputs of the two networks under the corresponding degradation conditions.The proposed aging modeling technique is evaluated and validated by modeling the aging S-parameters of different batches of InP HBTs degraded under different stress magnitudes and times in the frequency range of 0.1-40 GHz.
Trained artificial neural network models are usually evaluated using the prediction error of the test sample sets.For the reliability of the ANN model in this paper, the main factors affecting its network training in terms of dataset sample requirements are the total reliability sample size and the percentage of the training sample sets.Therefore, we set up seven groups of training experiments, as summarized in Table 2, to verify the validity and accuracy of the proposed model.The first, second, and third groups keep the same total sample size (2412 sets) and 50% training set share.These three experiments were conducted to model the degradation of each component of the S-parameters using the traditional ELM structure and the proposed PKNN structure, respectively, to compare the degradation prediction ability of the two networks under the same sample size and the same training conditions with different amounts of knowledge injection.Additionally, to further illustrate the superiority of the aging modeling approach proposed in the paper, keeping the same conditions as the first three sets of experiments, the widely used backpropagation (BP) and radial basis function (RBF) algorithms are applied for additional comparison in the fourth and fifth groups, respectively.Then, we set up a sixth set of experiments, still keeping the same total sample size (2412 sets) as the first five sets of experiments but randomly selecting 80% of the total sample size to train the network on the traditional ELM structure.This was conducted to compare the results with the second and third sets of experiments on prediction with the proposed PKNN structure to verify the prediction ability of the proposed model after the dependence on the reliability training samples is reduced.Finally, we supplemented a seventh set of experiments by taking about 60% (1458 sets) of the total sample size of the first six sets, with the set share remaining the same as that of the sixth set.Then, we compared it with the prediction results of the ELM structure in the sixth set of experiments to further compare and illustrate the prediction ability of the proposed PKNN model with the reduction in the total reliability sample size.In addition, after keeping the same total sample size and fixing the training set share for the third and sixth sets of experiments, the test sets they used for validation also naturally formed a set of control groups with a share of 50% and 20%, respectively.The results of the comparison between these two sets of experiments can be used to further illustrate how well the generalization ability of both the traditional ELM structure and the proposed PKNN structure is.In Table 3, we give the mean relative errors of the prediction of the degradation of each component of the S-parameters in the seven sets of experimental test sets, and their error equations are shown below.
where y mea and y pre represent the measured output and the predicted output of the neural network, respectively.N is the number of data.

Results and Discussion
In order to more intuitively represent the modeling process in this paper, we use the PKNN-based aging small-signal modeling flowchart shown in Figure 9 to briefly summarize the above steps of device small-signal aging knowledge construction, knowledge injection, D-ELM network training, and network optimization.In the first stage, we use NETWORK1 to train and predict the S-parameter samples in the fresh state of the device.Then, we transform the degradation trend of the key parameters into the degradation information of the S-parameters after applying stress to the device through the aging compact model in the second stage.In the third stage, we inject the rough aging information obtained in the second stage into the degradation NETWORK2, then train and optimize this network.Finally, the aging S-parameters of the device are obtained by accumulating the predicted outputs of the two networks under the corresponding degradation conditions.The proposed aging modeling technique is evaluated and validated by modeling the aging S-parameters of different batches of InP HBTs degraded under different stress magnitudes and times in the frequency range of 0.1-40 GHz.Trained artificial neural network models are usually evaluated using the prediction error of the test sample sets.For the reliability of the ANN model in this paper, the main factors affecting its network training in terms of dataset sample requirements are the total reliability sample size and the percentage of the training sample sets.Therefore, we set up seven groups of training experiments, as summarized in Table 2, to verify the validity and  From the comparison of the prediction results between the first and third sets of experiments, the overall prediction accuracy of the PKNN model for the degradation of each component of the S-parameter is better than that of the ELM structure.The improvement is at least 51.6% for each component and a reduction of 74.3% in the overall error for the same sample size and training configuration.For the fourth and fifth groups of BP and RBF algorithm networks, the overall prediction accuracy of PKNN is about three times higher than theirs.From the comparison of the prediction results between the third and sixth sets of experiments, compared with the aging prediction of the ELM structure without knowledge injection, the PKNN-based aging model can meet or even exceed its prediction accuracy for all the components of the S-parameters with at least a 37.5% reduction in the amount of training data.Furthermore, for the second set of experiments with only 50% knowledge injection, although the accuracy of the PKNN model decreased compared to the third set, the total prediction accuracy still improved by 11.4% compared to the sixth set.Moreover, the size of the test samples to be validated for the PKNN model in the third set of experiments (1206 sets) is about 2.5 times larger than that of the ELM model in the sixth set of experiments (482 sets), while its overall error has been reduced by 57.2%, which is evident in the significant improvement of its generalization ability.From the comparison of the prediction results between the sixth and seventh sets of experiments, the proposed PKNN model reduces its overall error by 63.2%, with the total reliability sample size reduced by at least 39.6% while controlling the same occupancy of the training set, further demonstrating the superiority of the proposed model in terms of reliability sample dependency while guaranteeing accuracy.

Randomly
In addition, we have further compared the third and sixth sets of experiments, which have been verified to have improved prediction accuracy and generalization ability in terms of the efficiency and convergence of the network training optimization.In this paper, for each set of training samples, the input layer of NETWORK1, which describes the S-parameters of the fresh state of the device, has three input neurons, and the input layers of NETWORK2 and NETWORK2', which describe the amount of degradation of the S-parameters of the device, have five and thirteen input neurons, respectively.The D-ELM structure is a single hidden layer structure with 120 hidden neurons.Therefore, the number of individuals to be optimized in the AGA optimization process is 480 for the population of NETWORK1 and 720 and 1680 for the population of NETWORK2 and NETWORK2', respectively.The stopping criterion is set to a maximum of 1000 generations.The fitness evolution curve of NETWORK1 during the training iterations is shown in Figure 10a.The total optimization time is 1956, and its fitness value is optimized by 17.2%.For NETWORK2 using the ELM structure and NETWORK2' using the PKNN structure, their fitness evolution curves are shown in Figure 10b,c.The optimized convergence duration is 4939s and 2173s, respectively, and the convergence speed is accelerated by about 56.0%, and their fitness values are optimized by 34.6% and 30.8%, respectively.The results of the above networks in terms of efficiency and convergence of training and optimization are summarized in Table 4, from which it can also be seen that the overall fitness range of the PKNN aging model for its population during optimization iterations is about three times higher than that of the traditional ELM network.The high superiority of individuals throughout the optimization period further illustrates the advantages of the proposed model in terms of simulation accuracy.Using the prior knowledge-based AGA-D-ELM network developed above, we predict the S-parameters of InP DHBT devices in the fresh state and the degradation of the Sparameters under different stress magnitudes and stress times.A comparison of the simulations of NETWORK1 and the conventional MLP network for the fresh S-parameters is given in Figure 11, which further illustrates the accuracy of the predictions of the D-ELM structure.As shown in Figures 12a and 13a, we observed the performance of different batches of 0.7 × 13 µm 2 InP DHBT devices under the bias of (VCE = 1.6 V, Ib = 200 µA) and subjected to two stress conditions: (VCB,stress = 3.4 V, tstress = 250 min) and (VCB,stress = 4.3 V, tstress = 310 min).The model predictions of the degradation of the real and imaginary parts of each component of S-parameters are compared with the corresponding test results using the NETWORK2' and NETWORK2 trained in the second and third sets of experiments, respectively.In addition, to further complement the predictive effectiveness of the proposed model, we present the simulation results of the model in Figures 12b and 13b for the magnitude and phase of the degradation of each S-parameter component in the Advanced Design System (ADS).The simulation accuracy of NETWORK2' on the degradation of S-parameters is still able to reach or even be better than that of NETWORK2 after the dependence on reliability training samples is reduced.It can be seen that the proposed PKNN network structure is able to predict the degradation trend of the S-parameters of the HBT devices after the accelerated aging well, which provides guidance for the establishment of a highly efficient and accurate aging model for microwave devices in practical applications.Using the prior knowledge-based AGA-D-ELM network developed above, we predict the S-parameters of InP DHBT devices in the fresh state and the degradation of the Sparameters under different stress magnitudes and stress times.A comparison of the simulations of NETWORK1 and the conventional MLP network for the fresh S-parameters is given in Figure 11, which further illustrates the accuracy of the predictions of the D-ELM structure.As shown in Figures 12a and 13a, we observed the performance of different batches of 0.7 × 13 µm 2 InP DHBT devices under the bias of (V CE = 1.6 V, I b = 200 µA) and subjected to two stress conditions: (V CB,stress = 3.4 V, t stress = 250 min) and (V CB,stress = 4.3 V, t stress = 310 min).The model predictions of the degradation of the real and imaginary parts of each component of S-parameters are compared with the corresponding test results using the NETWORK2' and NETWORK2 trained in the second and third sets of experiments, respectively.In addition, to further complement the predictive effectiveness of the proposed model, we present the simulation results of the model in Figures 12b and 13b for the magnitude and phase of the degradation of each S-parameter component in the Advanced Design System (ADS).The simulation accuracy of NETWORK2' on the degradation of S-parameters is still able to reach or even be better than that of NETWORK2 after the dependence on reliability training samples is reduced.It can be seen that the proposed PKNN network structure is able to predict the degradation trend of the S-parameters of the HBT devices after the accelerated aging well, which provides guidance for the establishment of a highly efficient and accurate aging model for microwave devices in practical applications.
ulations of NETWORK1 and the conventional MLP network for the fresh S-parameters is given in Figure 11, which further illustrates the accuracy of the predictions of the D-ELM structure.As shown in Figures 12a and 13a, we observed the performance of different batches of 0.7 × 13 µm 2 InP DHBT devices under the bias of (VCE = 1.6 V, Ib = 200 µA) and subjected to two stress conditions: (VCB,stress = 3.4 V, tstress = 250 min) and (VCB,stress = 4.3 V, tstress = 310 min).The model predictions of the degradation of the real and imaginary parts of each component of S-parameters are compared with the corresponding test results using the NETWORK2' and NETWORK2 trained in the second and third sets of experiments, respectively.In addition, to further complement the predictive effectiveness of the proposed model, we present the simulation results of the model in Figures 12b and 13b for the magnitude and phase of the degradation of each S-parameter component in the Advanced Design System (ADS).The simulation accuracy of NETWORK2' on the degradation of S-parameters is still able to reach or even be better than that of NETWORK2 after the dependence on reliability training samples is reduced.It can be seen that the proposed PKNN network structure is able to predict the degradation trend of the S-parameters of the HBT devices after the accelerated aging well, which provides guidance for the establishment of a highly efficient and accurate aging model for microwave devices in practical applications.

Conclusions
A prior knowledge-based aging small-signal neural network model is proposed for the degradation prediction of S-parameters of microwave HBTs.The adopted dual-extreme learning machine architecture decomposes the aging features and the fresh features of the device into two sub-networks, respectively, to simplify the complex nonlinear mapping of the original problem.Moreover, we transform the degradation trend of the device key parameters into the degradation information of the S-parameters after the device is stressed by the aging compact model.Compared with the traditional MLP degradation sub-network architecture, the proposed knowledge-based reliability neural network model incorporates all this degradation information.This integration improves prediction accuracy with reduced reliability training sample dependency, as well as improving the generalization ability of the network.The superiority of the convergence efficiency of the proposed model in the optimization process is further verified by introducing an adaptive genetic algorithm to automatically optimize the input weights and hidden layer biases in the network to circumvent the instability caused by random initialization.Finally, the

Conclusions
A prior knowledge-based aging small-signal neural network model is proposed for the degradation prediction of S-parameters of microwave HBTs.The adopted dual-extreme learning machine architecture decomposes the aging features and the fresh features of the device into two sub-networks, respectively, to simplify the complex nonlinear mapping of the original problem.Moreover, we transform the degradation trend of the device key parameters into the degradation information of the S-parameters after the device is stressed by the aging compact model.Compared with the traditional MLP degradation sub-network architecture, the proposed knowledge-based reliability neural network model incorporates all this degradation information.This integration improves prediction accuracy with reduced reliability training sample dependency, as well as improving the generalization ability of the network.The superiority of the convergence efficiency of the proposed model in the optimization process is further verified by introducing an adaptive genetic algorithm to automatically optimize the input weights and hidden layer biases in the network to circumvent the instability caused by random initialization.Finally, the validity and accuracy of the proposed PKNN-based aging model are verified by comparing the predicted and measured results of the S-parameter degradation of different batches of 0.7 × 13 µm 2 InGaAs/InP DHBTs in the frequency range of 0.1-40 GHz.

Figure 2 .
Figure 2. Diagram of electrical stress application.

Figure 2 .
Figure 2. Diagram of electrical stress application.

Figure 2 .
Figure 2. Diagram of electrical stress application.

Figure 6 .
Figure 6.D-ELM neural network structure for aging S-parameter modeling.

Figure 9 .
Figure 9. Flowchart of the proposed PKNN-based aging modeling method.

Figure 9 .
Figure 9. Flowchart of the proposed PKNN-based aging modeling method.

Figure 11 .
Figure 11.Comparison between predicted and measured results of the S-parameters in the fresh state from 0.1 to 40 GHz.

Figure 11 .Figure 12 .Figure 12 .
Figure 11.Comparison between predicted and measured results of the S-parameters in the fresh state from 0.1 to 40 GHz.

Figure 12 .Figure 13 .
Figure 12.Comparison between predicted and measured results of (a) real and imaginary parts and (b) magnitude and phase of degradation of S-parameters from 0.1 to 40 GHz at VCB,stress = 3.4 V, tstress = 250 min.

Figure 13 .
Figure 13.Comparison between predicted and measured results of (a) real and imaginary parts and (b) magnitude and phase of degradation of S-parameters from 0.1 to 40 GHz at V CB,stress = 4.3 V, t stress = 310 min.
Micromachines 2023, 14, x FOR PEER REVIEW 3 of 17 optimize the model and improve the stability of network training.Finally, the proposed PKNN-based reliability model is validated by applying it to 0.7 µm InGaAs/InP HBTs. ).

Table 1 .
Comparison of time consumption and accuracy of MLP and D-ELM structures.

Table 1 .
Comparison of time consumption and accuracy of MLP and D-ELM structures.

Table 1 .
Comparison of time consumption and accuracy of MLP and D-ELM structures.

Table 2 .
Experimental setup of the dataset for training and validation.

Table 3 .
Comparison of prediction accuracy for seven groups of training experimental settings.

Table 4 .
Comparison of optimization efficiency and convergence of the networks.